Information extraction from multimedia web documents: an open-source platform and testbed
نویسندگان
چکیده
منابع مشابه
BEAT: An Open-Source Web-Based Open-Science Platform
With the increased interest in computational sciences, machine learning (ML), pattern recognition (PR) and big data, governmental agencies, academia and manufacturers are overwhelmed by the constant influx of new algorithms and techniques promising improved performance, generalization and robustness. Sadly, result reproducibility is often an overlooked feature accompanying original research pub...
متن کاملInformation extraction and imprecise query answering from web documents
Word based searches for relevant information from texts retrieve a huge collection and burden the user with information overload. Ontology based text information retrieval can perform concept-based search and extract only relevant portions of text containing concepts that are present in the query or those that are semantically linked to query concepts. While these systems have better precision ...
متن کاملNutch: an Open-Source Platform for Web Search
Nutch is an open-source project providing both complete Web search software and a platform for the development of novel Web search methods. Nutch is built on a distributed storage and computing foundation, such that every operation scales to very large collections. Core algorithms crawl, parse and index Web-based data. Plugins extend functionality at various points, including network protocols,...
متن کاملRefinery: An Open Source Topic Modeling Web Platform
We introduce Refinery, an open source platform for exploring large text document collections with topic models. Refinery is a standalone web application driven by a graphical interface, so it is usable by those without machine learning or programming expertise. Users can interactively organize articles by topic and also refine this organization with phrase-level analysis. Under the hood, we tra...
متن کاملInformation Extraction from Template-Generated Hidden Web Documents
The larger amount of information on the Web is stored in document databases and is not indexed by general-purpose search engines (such as Google and Yahoo). Databases dynamically generate a list of documents in response to a user query – which are referred to as Hidden Web databases. Such documents are typically presented to users as templategenerated Web pages. This paper presents a new approa...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Multimedia Information Retrieval
سال: 2014
ISSN: 2192-6611,2192-662X
DOI: 10.1007/s13735-014-0051-2